1 Drawing your first plot

To get a first feel for ggplot2, let’s try to run some basic ggplot2 commands. The mtcars dataset contains information on 32 cars from a 1973 issue of Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.

  • Load the ggplot2 package using library().

  • Use str() to explore the structure of the mtcars dataset.

  • Hit Submit Answer. This will execute the example code on the right. See if you can understand what ggplot does with the data.

# Load the ggplot2 package
library(ggplot2)

# Explore the mtcars data frame with str()
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
# Execute the following command
ggplot(mtcars, aes(cyl, mpg)) +
  geom_point()

1.1 Data columns types affect plot types

The plot from the previous exercise wasn’t really satisfying. Although cyl (the number of cylinders) is categorical, you probably noticed that it is classified as numeric in mtcars. This is really misleading because the representation in the plot doesn’t match the actual data type. You’ll have to explicitly tell ggplot2 that cyl is a categorical variable.

  • Change the ggplot() command by wrapping factor() around cyl.

  • Hit Submit Answer and see if the resulting plot is better this time.

# Load the ggplot2 package
library(ggplot2)

# Change the command below so that cyl is treated as factor
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_point()

2 Mapping data columns to aesthetics

Let’s dive a little deeper into the three main topics in this course: The data, aesthetics, and geom layers. We’ll get to making pretty plots in the last chapter with the themes layer.

We’ll continue working on the 32 cars in the mtcars data frame.

Consider how the examples and concepts we discuss throughout these courses apply to your own data-sets!

  • Add a color aesthetic mapped to the displacement of the car engine: inside aes(), add a color argument equal to disp.
# Edit to add a color aesthetic mapped to disp
ggplot(mtcars, aes(wt, mpg, color = disp))+
  geom_point()

  • This time, map disp to the size aesthetic.
# Change the color aesthetic to a size aesthetic
ggplot(mtcars, aes(wt, mpg, size = disp)) +
  geom_point()

2.1 Understanding variables

In the previous exercise you saw that disp can be mapped onto a color gradient or onto a continuous size scale.

Another argument of aes() is the shape of the points. There are a finite number of shapes which ggplot() can automatically assign to the points. However, if you try this command in the console to the right:

ggplot(mtcars, aes(wt, mpg, shape = disp)) +
  geom_point()

It gives an error. What does this mean?

  • Erro: A continuous variable can not be mapped to shape

3 ggplot2 layers

g <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_jitter(alpha = 0.5)
g

g +
  theme_classic()

3.1 Adding geometries

The diamonds dataset contains details of 1,000 diamonds. Among the variables included are carat (a measurement of the diamond’s size) and price.

You’ll use two common geom layer functions:

As you saw previously, these are added using the + operator.

ggplot(data, aes(x, y)) +
  geom_*()

Where * is the specific geometry needed.

  • Explore the diamonds data frame with the str() function.
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
  • Edit the plot code to add a point geom. Use the + operator to add geom_point() to the ggplot() command.
# Add geom_point() with +
ggplot(diamonds, aes(carat, price)) +
  geom_point()

  • Add a smooth geom to the plot. Use the + operator to add geom_smooth().
# Add geom_point() with +
ggplot(diamonds, aes(carat, price)) +
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

3.2 Changing one geom or every geom

If you have multiple geoms, then mapping an aesthetic to data variable inside the call to ggplot() will change all the geoms. It is also possible to make changes to individual geoms by passing arguments to the geom_*() functions.

geom_point() has an alpha argument that controls the opacity of the points. A value of 1 (the default) means that the points are totally opaque; a value of 0 means the points are totally transparent (and therefore invisible). Values in between specify transparency.

The plot you drew last time is provided in the script.

  • Edit the plot code to map the color aesthetic to the clarity data variable.
# Map the color aesthetic to clarity
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

# Make the points 40% opaque
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point(alpha = 0.4) +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

3.3 Saving plots as variables

Plots can be saved as variables, which can be added two later on using the + operator. This is really useful if you want to make multiple related plots from a common base.

  • Using the diamonds dataset, plot the price (y-axis) versus the carat (x-axis), assigning to plt_price_vs_carat.

  • Using geom_point(), add a point layer to plt_price_vs_carat.

# Draw a ggplot
plt_price_vs_carat <- ggplot(
  # Use the diamonds dataset
  diamonds,
  # For the aesthetics, map x to carat and y to price
  aes(x = carat, y = price)
)

# Add a point layer to plt_price_vs_carat
plt_price_vs_carat + 
  geom_point()

  • Add an alpha argument to the point layer to make the points 20% opaque, assigning to plt_price_vs_carat_transparent.

  • Type the plot’s variable name (plt_price_vs_carat_transparent) to display it.

# From previous step
plt_price_vs_carat <- ggplot(diamonds, aes(carat, price))

# Edit this to make points 20% opaque: plt_price_vs_carat_transparent
plt_price_vs_carat_transparent <- plt_price_vs_carat + geom_point(alpha = 0.2)

# See the plot
plt_price_vs_carat_transparent

  • Inside geom_point(), call aes() and map color to clarity, assigning to plt_price_vs_carat_by_clarity.

  • Type the plot’s variable name (plt_price_vs_carat_by_clarity) to display it.

# From previous step
plt_price_vs_carat <- ggplot(diamonds, aes(carat, price))

# Edit this to map color to clarity,
# Assign the updated plot to a new object
plt_price_vs_carat_by_clarity <- plt_price_vs_carat + 
  geom_point(aes(color = clarity))

# See the plot
plt_price_vs_carat_by_clarity